Taking a quick view of the data
# For the 2019 data set
dim(data2019)
## [1] 39 17
head(data2019, 10)
## # A tibble: 10 × 17
## ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 Loca… Resi… <NA> <NA> <NA> <NA> <NA> Solar <NA> <NA> <NA> <NA> "Non…
## 3 <NA> Dail… MWh <NA> <NA> Cust… <NA> Numb… <NA> "Gen… <NA> Ener… "MWh"
## 4 <NA> <NA> Gene… Off … Total Off … Total Res Non-… "Res… Non-… <NA> <NA>
## 5 BAYS… 12.4… 2756… 2947… 3051… 1552… 6697… 2842 190 "917… 3900… 5848… "136…
## 6 BURW… 12.8… 6442… 3814… 6824… 2241… 1455… 684 35 "222… 631.… 1434… "414…
## 7 CANA… 13.4… 1784… 8682… 1870… 5259… 3805… 1601 85 "588… 3572… 3427… "678…
## 8 CANT… 15.2… 6444… 8135… 7258… 3701… 1299… 9947 561 "320… 1613… 2038… "231…
## 9 CENT… 16.4… 7375… 1699… 9074… 8133… 1510… 21015 674 "754… 1918… 5525… "222…
## 10 CESS… 18.9… 1451… 2206… 1671… 1076… 2412… 4329 204 "194… 4744… 1192… "431…
## # … with 4 more variables: ...14 <chr>, ...15 <chr>, ...16 <chr>, ...17 <chr>
# For the 2020 data set
dim(data2020)
## [1] 41 17
head(data2020, 10)
## # A tibble: 10 × 17
## ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Regi… Loca… Resi… <NA> <NA> <NA> <NA> <NA> Solar <NA> <NA> <NA> <NA>
## 2 <NA> <NA> Dail… MWh <NA> <NA> Cust… <NA> Numb… <NA> "Gen… <NA> Ener…
## 3 <NA> <NA> <NA> Gene… Off … Total Off … Total Res Non-… "Res… Non-… <NA>
## 4 Sydn… BAYS… 12.4 2778… 28618 3064… 15314 67909 3134 212 "113… 4871 7584
## 5 <NA> BURW… 12.6 63676 3638 67314 2196 14600 753 38 "273… 674 1696
## 6 <NA> CANA… 13.3 1778… 8305 1861… 5075 38223 1805 99 "729… 3950 4406
## 7 <NA> CANT… 15.1 6416… 79006 7206… 36476 1310… 10900 631 "389… 17862 26313
## 8 <NA> CUMB… 13.1 1019… 8697 1106… 4027 23149 1908 129 "619… 4167 4519
## 9 <NA> GEOR… 14.6 2699… 39761 3097… 19182 58079 4351 181 "159… 3850 9931
## 10 <NA> HORN… 18.2 3013… 49437 3507… 20861 52676 6789 180 "282… 4519 16246
## # … with 4 more variables: ...14 <chr>, ...15 <chr>, ...16 <chr>, ...17 <chr>
# For the 2023 data set
dim(data2023)
## [1] 41 17
head(data2023, 10)
## # A tibble: 10 × 17
## ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Regi… Loca… Resi… <NA> <NA> <NA> <NA> <NA> Solar <NA> <NA> <NA> <NA>
## 2 <NA> <NA> Dail… MWh <NA> <NA> Cust… <NA> Numb… <NA> "Gen… <NA> Ener…
## 3 <NA> <NA> <NA> Gene… Off … Total Off … Total Res Non-… "Res… Non-… <NA>
## 4 Sydn… BAYS… 12.6… 2882… 2733… 3155… 14792 68610 4381 279 "211… 6862… 1605…
## 5 <NA> BURW… 12.4… 6441… 3376… 6779… 2101 14861 995 52 "469… 988.… 3398…
## 6 <NA> CANA… 13.5… 1830… 7881… 1909… 4765 38740 2490 117 "131… 4284… 9463…
## 7 <NA> CANT… 15.0… 6567… 7574… 7325… 35055 1334… 14679 823 "696… 2682… 5498…
## 8 <NA> CUMB… 12.8… 1012… 8201… 1094… 3853 23410 2550 179 "115… 6067… 9777…
## 9 <NA> GEOR… 14.6… 2772… 3853… 3157… 18522 59056 6046 234 "307… 6387… 2171…
## 10 <NA> HORN… 18.4… 3095… 4985… 3594… 20435 53315 9222 222 "495… 6107… 3195…
## # … with 4 more variables: ...14 <chr>, ...15 <chr>, ...16 <chr>, ...17 <chr>
This data source is from Ausgrid, which is a large electricity provider that has more than 1.7 million customers in Sydney, Australia.Although this is a electricty provider that is run by a company, however, it must be subjected to a range of regulations and guidelines to ensure the accuracy of their data, including the Australian Energy Regulator(AER), according to the Australian energy policies.
Some useful functions
wrap.it <- function(x, len)
{
sapply(x, function(y) paste(strwrap(y, len),
collapse = "\n"),
USE.NAMES = FALSE)
}
wrap.labels <- function(x, len)
{
if (is.list(x))
{
lapply(x, wrap.it, len)
} else {
wrap.it(x, len)
}
}
To get an idea of the new data, we will do a quick bargraph of the ‘general supply’ columns from the excel file. (It is with several trial and error, that we have found out the original excel file is not suitable for R data analysis, therefore manual excel operation was done to extract it out into a form which was useful.)
library(readxl)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
options(scipen = 999)
# Read in files
extracted_data <- read_excel("data/Electricity_consumption(extracted).xlsx")
extracted_data <- arrange(extracted_data)
# Setting up the output graph
par(mar = c(5, 8, 0, 2), mgp = c(4, 1, 0))
# The 2019 Dataset
values2019 <- extracted_data$`2019-R`
suburbs2019 <- extracted_data$H2019
barplot(values2019, names.arg=wrap.labels(rev(suburbs2019), 20),
col="red", xlab=" Electricity consumption/MWh", horiz=T, las=2, cex.names = 0.7)
# The 2020 Dataset
values2020 <- extracted_data$`2020-R`
suburbs2020 <- extracted_data$H2020
barplot(values2020, names.arg=wrap.labels(rev(suburbs2020), 20),
col="green", xlab=" Electricity consumption/MWh", horiz=T, las=2, cex.names = 0.7)
# The 2021 Dataset
values2021 <- extracted_data$`2021-R`
suburbs2021 <- extracted_data$H2021
barplot(values2021, names.arg=wrap.labels(rev(suburbs2021), 20),
col="blue", xlab=" Electricity consumption/MWh", horiz=T, las=2, cex.names = 0.7)
The impact of the covid-19 pandemic on electricity consumption in the different suburbs of Sydney As covid-19 strikes Australia, our reliance on digital equipments especially remote streaming apps has increased drastically.So, the hypothesis is the usage of digital equipments has led to a increase in electricity consumption. Is there an electricity usage rise during the pandemic?
From the IDA, it seems like the data itself is not distinctive enough to be able figure out the difference from first site.
sdev1 = mean(values2019)
sdev2 = mean(values2020)
sdev3 = mean(values2021)
ts = c(sdev1, sdev2, sdev3)
barplot(ts, names.arg=c('2019', '2020', '2021'))
It seems like the hypothesis has been falsified, there was a slight
electric usage increase, but not during pandemic, instead it was after
the pandemic in 2021 However, it is also considered that as some values
from the 2022-2023 dataset is missing, part of this could also
contributed to this phenomenon.
As there was a lot of changes to the distribution of population before and after the pandemic, the impact of the change was investigated. To get an general idea of which areas had the highest amount of electricity consumption, normal distribution graph was made. (We’ll use the latest complete dataset, which was the 2021 dataset) General Supply
distribution <- function(title, name, values){
df <- data.frame(label=name, value=values)
df_sorted <- df[order(df$value, decreasing = FALSE),]
mean_val <- mean(values)
sd_val <- sd(values)
par(mar = c(5, 6, 2, 2), mgp = c(4, 1, 0), fig = c(0, 1, 0, 1), fin = c(6, 4))
barplot(df_sorted$value, names.arg=wrap.labels(df_sorted$label, 20),
col="blue", xlab=title, horiz=T, las=2, cex.names = 0.4, , cex.lab=0.7, cex.axis=0.7)
hist(values, breaks=10, freq=FALSE, col="grey", xlab="Values", main="Normal Distribution")
curve(dnorm(x, mean=mean_val, sd=sd_val), add=TRUE, col="blue")
abline(v = mean_val, col = "red", lwd = 2)
}
distribution("Electricity Consumption/MWh", suburbs2021, values2021)
‘Non-residential small-medium sites’
values2021M <- extracted_data$`2021-M`
distribution("Electricity Consumption/MWh",suburbs2021, values2021M)
‘Non-residential large sites’
values2021L <- extracted_data$`2021-L`
distribution("Electricity Consumption/MWh",suburbs2021, values2021L)
Finally, to make a conclusion, a pie chart is made
par(mar = c(5, 6, 2, 2), mgp = c(4, 1, 0), fig = c(0, 1, 0, 1), fin = c(6, 4))
sum_2021G <- sum(values2021)
sum_2021M <- sum(values2021M)
sum_2021L <- sum(values2021L)
pie_values <- c(sum_2021G, sum_2021M, sum_2021L)
pie_labels <- c("General Supply", "Medium Sites", "Large Sites")
pie(pie_values, labels = pie_labels)
As can see, this shows a nuanced side of the distribution of electricity
consumption by Ausgrid, as Ausgrid is the largest distributor of
electricity on the east coast, there’s a high chance that this
represents the total electricity consumption of NSW. If we’re talking
about large areas like shopping malls, business skyscrapers or tourist
attractions, than population does matter, in fact a lot, so places where
there tends to be large flow of population, like Sydney(The normal
distribution graph shows how skewed this situation is), actually uses
the most amount electrcity. However, if we’re talking about residential
supplies, which corresponds to the ‘General Supply’ column of the data
source.Than population matters less, because with residential usage, the
amount of electricty consumption depends on the size of the housing,
which is why I guess Central Coast has the most amount of electricty
(Note: Again, just to be cautious, it is considered that Ausgrid
could’ve have a target customer base, which could explain part of this
situation as well)
@misc{jrara_17AD, title={plot - R barplot: wrapping long text labels?}, url={https://stackoverflow.com/questions/20241065/r-barplot-wrapping-long-text-labels}, journal={Stack Overflow}, author={jrara}, year={17AD}, month={Nov} }